This research tested maize, an economic crop grown in Phetchabun province, 
Thailand, by installing 20 sets of internet of things (IoT) devices, each 
consisting of a soil moisture sensor and a temperature and humidity sensor 
(DHT11). The data science tool RapidMiner Studio was used for data 
cleansing, data imputation, clustering, and prediction. After cleansing, the 
data were grouped with the k-means technique to find the optimal clustering 
and to identify the optimum condition and amount of water required to grow 
the maize. The optimization yielded 3 classes, and these data were then used 
to train prediction models and assess their precision. A comparison of several 
algorithms, including artificial neural network (ANN), decision tree, naive 
Bayes, and deep learning, showed that the deep learning algorithm provided 
the most accurate result, at 99.6% accuracy with root mean square error 
(RMSE)=0.0039. The resulting algorithm was used to write a function that 
controls the automated watering system so that the temperature and humidity 
for growing maize stay at appropriate levels. The improved watering system 
increased watering efficiency and saved 13.89% of water. 

1. INTRODUCTION 


Artificial intelligence (AI) is a technology that enables machines, computers, and statistical tools and 
equipment to run software that imitates human capabilities, especially on very complex tasks such as 
memory, classification, reasoning, decision-making, prediction, and even communication with human beings, 
all through algorithms. In some cases, AI can improve through self-learning, which is described at 3 levels: 
machine learning [1], machine intelligence [2], and machine consciousness [3]. Machine learning is the AI 
capability by which a machine learns on its own from data. 

This research combined internet of things (IoT), big data, and AI technology and integrated them into an 
agricultural system as a more effective alternative to help solve the problems farmers currently face. The 
main objectives of this study are to improve speed, increase crop yields, and enable farmers to use natural 
resources more effectively. The integration of AI, IoT, and big data involves 3 main actions: i) to analyze 
the classification of data to find the optimum amount of water required, ii) to compare algorithms for the 
watering system, and iii) to improve the automated watering system based on the algorithm obtained. 


2. THEORETICAL BACKGROUND AND RELATED RESEARCH 
2.1. Artificial intelligence (AI) 

AI technology sits between automated and intelligent systems. It is crucial that the data obtained are 
analyzed with the appropriate data classification, data types, and relationships among the information in 
big data, so that decisions can rest on this analysis rather than on human judgement. This type of 
classification is suited to predictive modeling known as supervised learning [4]: a model is built from 
sample data whose groups have been assigned in advance and is then used to predict the groups of data that 
have not yet been classified. The model might take the form of an artificial neural network (ANN) [5], a 
decision tree [6], naive Bayes, or deep learning. This research compares these algorithms to find the most 
effective one. 


2.1.1. Artificial neural network (ANN) 

An ANN is a type of self-learning model inspired by the human brain. Data are fed into a neural network to 
build a model that is then used to predict future data [7]. The network adjusts itself so that its 
predictions have as little error as possible. Each neuron computes a weighted sum of its inputs, as in (1): 


$$n = \sum_{i=1}^{z} x_i w_i + b \tag{1}$$
where n=output of the sum function, x_i=input i, w_i=weight i, z=number of inputs in the input layer, 
b=bias, and i=1 to z. 
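
The weighted sum in (1) can be illustrated with a few lines of Python. This is a minimal sketch, not the network used in the study: the function name `neuron_sum` and the input, weight, and bias values are hypothetical and chosen only to make the arithmetic easy to follow.

```python
def neuron_sum(inputs, weights, bias):
    """Return n = sum(x_i * w_i) + b for a single neuron, as in (1)."""
    if len(inputs) != len(weights):
        raise ValueError("inputs and weights must have the same length")
    return sum(x * w for x, w in zip(inputs, weights)) + bias


# Hypothetical scaled readings: temperature, air humidity, soil moisture.
x = [0.5, 0.25, 1.0]
# Hypothetical weights and bias (assumed values, not learned from data).
w = [2.0, 4.0, -1.0]
n = neuron_sum(x, w, bias=0.5)
print(n)  # 0.5*2.0 + 0.25*4.0 + 1.0*(-1.0) + 0.5
```

In a full network, n would then be passed through an activation function before reaching the next layer.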


2.1.2. Decision tree (DT) 

A decision tree is a type of self-learning model that uses mathematics to identify the best choice by 
building a tree-structured predictor, which is learned through supervised learning. Once trained on a labeled 
data set, it automatically classifies data that have not been grouped before. In [8], the rank widget scores 
the attributes according to their correlation with the class. The attribute scoring methods available in the 
rank widget are information gain, information gain ratio, and Gini [9], [10]. 
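
Two of the scoring methods named above, information gain and Gini, can be sketched as follows. This is an illustrative sketch under assumptions, not the rank widget's implementation: the helper names and the four sample labels are hypothetical, and the gain ratio is not shown.

```python
from collections import Counter
from math import log2


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum((c / total) * log2(c / total) for c in Counter(labels).values())


def gini(labels):
    """Gini impurity of a list of class labels."""
    total = len(labels)
    return 1.0 - sum((c / total) ** 2 for c in Counter(labels).values())


def information_gain(parent, children):
    """Entropy of the parent minus the size-weighted entropy of the children."""
    total = len(parent)
    weighted = sum(len(child) / total * entropy(child) for child in children)
    return entropy(parent) - weighted


# Hypothetical watering classes before and after splitting on one attribute.
parent = ["high", "high", "none", "none"]
split = [["high", "high"], ["none", "none"]]
print(information_gain(parent, split), gini(parent))
```

A split that separates the classes perfectly, as in this toy example, recovers the full entropy of the parent node, so the attribute would be ranked highest.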


2.1.3. Naive bayesian (NB) 

Naive Bayes (NB) is a type of learning through probability based on Bayes' theorem, with a 
straightforward, non-complex algorithm. It classifies information by learning from the cases that have 
occurred and uses these conditions to re-classify the data. The classification uses probability and 
computation based on the hypothesis formed from the data. Newly calculated models are used to adjust and 
re-classify, which can either increase or reduce the probability assigned to the information [11]. New 
information and sample sets are then combined with the existing data, as in (2): 


$$P(A|B) = \frac{P(B|A) \times P(A)}{P(B)} \tag{2}$$

where A and B are events and P(B) ≠ 0, 

P(A|B) is the likelihood of event A occurring given that B is true, 

P(B|A) is the likelihood of event B occurring given that A is true, and 

P(A) and P(B) are the probabilities of observing A and B independently of each other. 
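
A short worked example may help make (2) concrete. The sketch below assumes hypothetical events and probabilities that are not taken from the study: A is "the plot needs a high amount of water" and B is "the soil moisture reading is low".

```python
from fractions import Fraction


def posterior(p_b_given_a, p_a, p_b):
    """Return P(A|B) = P(B|A) * P(A) / P(B), as in (2)."""
    if p_b == 0:
        raise ValueError("P(B) must be non-zero")
    return p_b_given_a * p_a / p_b


# Hypothetical probabilities, assumed only for illustration.
p_a = Fraction(3, 10)          # P(A): plot needs a high amount of water
p_b_given_a = Fraction(4, 5)   # P(B|A): low reading when water is needed
p_b = Fraction(2, 5)           # P(B): low reading overall
print(posterior(p_b_given_a, p_a, p_b))
```

With these assumed numbers, observing a low reading raises the probability of A from 3/10 to 3/5.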


2.1.4. Deep learning 

Deep learning is a type of automated learning that imitates the function of human neural networks 
(neurons) by stacking neural networks in several layers and learning from sample data. The learned model is 
used to detect patterns or to sort data into categories. It uses a multi-layer feed-forward neural network 
trained with back propagation to build the deep learning model [12]. 


2.1.5. Algorithm performance 


After training, the model is tested on test data to evaluate its accuracy (model evaluation) and thereby 
the effectiveness of the algorithm. In this research, the algorithm performance metrics are accuracy and 
root mean square error (RMSE). Accuracy is given by (3): 

$$Accuracy = \left(\frac{TP+TN}{TP+TN+FP+FN}\right) \times 100 \tag{3}$$

where TP, TN, FP, and FN are the numbers of true positives, true negatives, false positives, and false 
negatives. RMSE is given by (4): 

$$RMSE = \sqrt{\frac{1}{N}\sum_{j=1}^{N}\left(y_j-\hat{y}_j\right)^2} \tag{4}$$

where N is the number of test records, y_j is the actual value, and \hat{y}_j is the predicted value. 
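
The two metrics can be computed as in the following sketch. It is illustrative only: the numeric class encoding (0=no water, 1=medium, 2=high) and the sample values are assumptions, and accuracy is written as correct predictions over all predictions, which equals (TP+TN)/(TP+TN+FP+FN) in the two-class case.

```python
from math import sqrt


def accuracy(actual, predicted):
    """Percentage of records whose predicted class matches the actual class."""
    correct = sum(1 for a, p in zip(actual, predicted) if a == p)
    return correct / len(actual) * 100


def rmse(actual, predicted):
    """Root mean square error between actual and predicted values."""
    squared = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    return sqrt(squared / len(actual))


# Hypothetical encoded classes: 0 = no water, 1 = medium, 2 = high.
actual = [0, 1, 2, 2]
predicted = [0, 1, 2, 1]
print(accuracy(actual, predicted), rmse(actual, predicted))
```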


2.2. Internet of things (IoT) 

IoT is a paradigm that connects sensors to the internet. It lets users command and control the operation 
of different equipment [13] or circuits that detect changes and report the results as signals. Examples of 
sensors [14] include humidity sensors, temperature sensors, immersion sensors, vibration sensors, current 
leakage sensors, and intelligent video sensors, accessed through smart phones or computers. This process 
creates smart systems such as smart agriculture, smart devices [15], smart grids [16], smart homes, smart 
cities [17], and smart transportation. 


2.3. Maize 

Maize has tall stems that range from 60 cm to 6 meters in height depending on the species. The stem 
diameter ranges from 1.27 to 5.08 cm. The plant takes approximately 100-120 days to mature [18]. Maize is 
one of the economic crops of Thailand. In 2019-2020, the area planted with maize declined throughout the 
country because land was left idle: farmers were unable to grow the crop owing to drought caused by delayed 
rain and insufficient natural water reserves. While cultivation is declining, domestic demand for maize is 
still increasing, especially in the livestock business, where maize is the main ingredient of animal feed. 
Demand is up to 7.41 million tons per year, while the annual supply is only 4.62 million tons [19]. 


2.4. Literature review 

The papers reviewed in Table 1 show that the fundamental idea of IoT is to connect physical objects, 
sensors, and other smart technologies into one system. IoT is used to view sensor values via the internet. 
Data compiled from IoT are used for analysis, clustering, classification, or prediction. Pulling this 
important data (big data) from several sensors through edge computing and data pre-processing allows the 
collected information to be classified and completed. These data can be either structured or unstructured. 


Table 1. Literature review methodology 

Subject: IoT. Reference: [13]-[15], [17], [20]-[22] 
- Cycle of smart farming. 
- Sensing and monitoring: robotics and sensors (temperature, humidity, and CO2), greenhouse computers. 
- Analysis and planning: seeding, planting, soil typing, crop health, yield modelling, lighting, energy, 
and management. 
- Control: precision farming, climate control. 
- IoT definitions, technologies, and applications. 
- IoT solution, devices, platform. 

Subject: Big data. Reference: [23], [24] 
- Data collection: capture, storage, and transfer of data. 
- Data pre-processing: transformation, marketing, and analysis of big data, big data greening, caching, 
remote sensing, real-time analysis. 

Subject: AI. Reference: [2], [3], [25]-[28] 
- Choosing pattern for AI: machine learning, machine intelligence, and machine consciousness. 


The data classification methodology divides the data into 2 sets. The training set is used to train the 
AI; it is used to find the variables that make the model function properly. The test set measures the 
model's performance before the model is put to use. After that, a pattern for AI is chosen, that is, the 
appropriate algorithm is selected. 


3. RESEARCH METHOD 
Figure 1 shows the steps used to find the algorithm, as follows: 


3.1. IoT 

Based on previous research, each router node was installed as one set consisting of an Arduino Uno R3, a 
soil moisture sensor, a temperature and humidity sensor (DHT11), a NODEMCU ESP8266 module, and a battery 
pack. The research required 20 such sets to cover 1,200 square meters of experimental land, plus one 
coordinator node consisting of an Arduino Uno R3, a NODEMCU ESP8266, and a water pump, as shown in Figure 2. 
The sensors deliver data in analog and digital form, and a program written in Python collected the sensor 
data through the Arduino board. The Arduino board read the data and delivered them to the cloud system 
through a web service. The cloud system received the data and recorded them in a Firebase database, 
separating them by measurement, such as soil moisture from the soil moisture sensor and humidity and 
temperature from the DHT11. The information was transmitted through the NODEMCU ESP8266 wireless module, 
which was linked to Wi-Fi and sent the readings to Firebase. Data were captured at 8 fixed times each day, 
which were set when the devices were first programmed in the Arduino IDE. 


[Diagram labels: step 2: big data; sensor system and IoT; water pump] 

Figure 2. The prototype IoT and design of hardware sensors for maize area 


3.2. Big data 

Once the data were received from the sensors (step 1), an application programming interface (API) was 
designed to receive the data from the devices and record them in a cloud database. The cloud system was 
then extended with a database that collects the humidity and temperature from each station. Data were then 
pulled from the cloud system and analyzed to predict the right amount of water for growing the crops in 
the future. 


3.3. AI 

Figure 3 shows the steps for finding the best prediction algorithm. Several tools are available for 
applying machine learning algorithms to data cleansing, data imputation, clustering, and prediction. After 
a review of data science tools, RapidMiner was chosen as one of the best tools for data science and machine 
learning because it makes data analytics extremely fast and easy [29]. 

Sensors captured data every day for 30 consecutive days at 8 time points: 1 am, 4 am, 7 am, 10 am, 1 pm, 
4 pm, 7 pm, and 10 pm. In total, 20 device sets x 8 time points x 30 days = 4,800 records were expected. 
Because some data might be lost when collecting from the sensors, this research used cleansing and 
imputation with a deep learning algorithm, using the rectified linear unit (ReLU) activation function, to 
fill in the missing temperature and humidity data. 
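
Before imputation, the lost readings have to be located. The sketch below shows one way to enumerate the 4,800 expected records and list the missing ones; it is an assumption-laden illustration, not the RapidMiner process used in the study. The `(station, day, hour)` key and the two dropped records are hypothetical.

```python
from itertools import product

STATIONS = range(1, 21)                 # 20 device sets
DAYS = range(1, 31)                     # 30 consecutive days
HOURS = [1, 4, 7, 10, 13, 16, 19, 22]   # 1 am ... 10 pm in 24-hour form


def expected_slots():
    """All (station, day, hour) combinations that should have a record."""
    return set(product(STATIONS, DAYS, HOURS))


def missing_slots(received):
    """Return the expected slots that have no record, in sorted order."""
    return sorted(expected_slots() - set(received))


# Hypothetical data set in which two readings were lost in transmission.
received = expected_slots() - {(3, 12, 13), (17, 5, 22)}
print(len(expected_slots()))
print(missing_slots(received))
```

The slots returned by `missing_slots` are the ones whose temperature and humidity values would then be imputed.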


[Figure 3 flow: DHT11 sensor; Arduino board; Firebase cloud; data cleansing; clustering (k-means); 
prediction algorithm; embedded code; customized watering of the plants] 

Figure 3. Steps for predictive model 


After that, the records, with the attributes time, station, temperature, humidity, and soil moisture, 
were tested with the k-means clustering technique to identify the appropriate number of clusters. The 
result showed that the most appropriate number of clusters is 3: this is the elbow point of the average 
within-centroid distance (34.643) and gives the best (lowest) Davies-Bouldin index (0.959), as shown in 
Table 2. With the 3 clusters identified, 3 classes were created according to the watering given to the 
crops: high, medium, and no water. 


Table 2. Result of cluster optimization 
Number of clusters    Davies-Bouldin index    Average within-centroid distance 
2                     1.138                   54.710 
3                     0.959                   34.643 
4                     1.023                   29.628 
5                     1.093                   25.188 
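
The choice of 3 clusters can be checked directly from the values in Table 2. The sketch below is illustrative: the function names are hypothetical, and the elbow rule used here (the largest drop-off in improvement between consecutive values of k) is one common heuristic, assumed for illustration rather than taken from the study.

```python
# (k, Davies-Bouldin index, average within-centroid distance) from Table 2.
TABLE_2 = [
    (2, 1.138, 54.710),
    (3, 0.959, 34.643),
    (4, 1.023, 29.628),
    (5, 1.093, 25.188),
]


def best_by_davies_bouldin(rows):
    """Return the k with the lowest Davies-Bouldin index."""
    return min(rows, key=lambda row: row[1])[0]


def elbow_point(rows):
    """Return the interior k where the gain in distance falls off the most."""
    best_k, best_bend = None, float("-inf")
    for prev, cur, nxt in zip(rows, rows[1:], rows[2:]):
        bend = (prev[2] - cur[2]) - (cur[2] - nxt[2])
        if bend > best_bend:
            best_k, best_bend = cur[0], bend
    return best_k


print(best_by_davies_bouldin(TABLE_2), elbow_point(TABLE_2))
```

Both criteria point to the same number of clusters for the values in Table 2.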


Next, these 3 classes were used in the experiment to supply the appropriate amount of water, monitored 
with the soil moisture sensor and the DHT11 sensor, which measure soil moisture and temperature. The class 
that requires a high amount of water needs a total of 15 minutes of watering, while the class that requires 
a medium amount needs 10 minutes. These durations leave the soil moist enough to provide appropriate 
conditions for the plantation. For model selection, training data and test data were used with the 
cross-validation method to compare the models, and parameter optimization was used to find the most 
appropriate values for each model. As Table 3 shows, optimizing the artificial neural network gave its best 
result, the one compared with the other algorithms, at 7 folds and 18 training cycles, with 99.37% accuracy 
and RMSE=0.0166. Optimizing the decision tree gave its best result at 7 folds with gain ratio as the 
criterion, with 99.35% accuracy and RMSE=0.0053. 


Table 3. Results from the algorithm comparison of the analysis of data on humidity and temperature 

Algorithm                    Accuracy    RMSE 
Artificial neural network    99.37%      0.0166 
Decision tree                99.35%      0.0053 
Naive Bayes                  97.46%      0.0338 
Deep learning                99.60%      0.0039 


Optimizing naive Bayes gave its best result at 11 folds, with 97.46% accuracy and RMSE=0.0338. Optimizing 
deep learning gave its best result at 9 folds, with 99.60% accuracy and RMSE=0.0039. The best model was 
then selected by comparing accuracy and RMSE across the optimized models. The deep learning algorithm 
provided the highest accuracy, at 99.60%, and the lowest RMSE, at 0.0039. As Figure 4 shows, the design and 
improvement of the automated watering system was divided into 2 main parts: 


[Figure 4 diagram labels: NodeMCU ESP8266; Internet module; Arduino board; algorithm] 

Figure 4. Data transmission between routers and coordinator 


3.3.1. Router node design 

The router node consisted of an Arduino board connected to two types of sensors. The first was a soil 
moisture sensor, installed at the roots of the maize at a depth of 15 cm, to measure the moisture level in 
the soil. The second was a humidity and temperature sensor, used to measure the temperature. Together the 
sensors indicated the maize's need for water. The wireless communication module, NODEMCU ESP8266, sent the 
sensor signals read by the Arduino board to the coordinator node. 


3.3.2. Coordinator node 

The coordinator node consisted of two types of micro-controller boards, an Arduino board and a Node MCU, 
with the NODEMCU ESP8266 serving as the communication module that receives the sensor readings from the 
router nodes as input to the control algorithm. The algorithm regulates the switching on and off of the 
water pump: a signal is sent to the relay board of the coordinator, which is connected to the water pump 
that feeds the watering system in the maize fields. The sensor readings received from the router nodes, 
along with the status of all the pumps and solenoid valves, were then sent to the cloud service so that all 
the data are recorded and the status can be monitored in real time. The coordinator received temperature 
and humidity data from the routers through the NODEMCU ESP8266. These data were analyzed by the algorithm 
written to the Arduino board, which controls the water pump system so that it releases the appropriate 
amount of water into the experiment area. 
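
The control logic described above can be sketched in Python for illustration, although the study's version runs on the Arduino board. The sketch assumes that the predicted class arrives as one of the labels "high", "medium", or "none" per station; the function names are hypothetical, the 15 and 10 minute durations are those reported in section 3.3, and mapping the no-water class to 0 minutes is an assumption.

```python
# Watering minutes per predicted class: 15 and 10 minutes as reported in
# section 3.3; 0 minutes for the no-water class is an assumption.
WATERING_MINUTES = {"high": 15, "medium": 10, "none": 0}


def pump_schedule(predicted_classes):
    """Map each station's predicted class to a pump run time in minutes."""
    schedule = {}
    for station, label in predicted_classes.items():
        if label not in WATERING_MINUTES:
            raise ValueError(f"unknown class {label!r} for station {station}")
        schedule[station] = WATERING_MINUTES[label]
    return schedule


def pump_should_run(schedule):
    """The relay is switched on only if some station needs water."""
    return any(minutes > 0 for minutes in schedule.values())


# Hypothetical predictions for three stations.
predictions = {1: "high", 2: "none", 3: "medium"}
schedule = pump_schedule(predictions)
print(schedule, pump_should_run(schedule))
```

Keeping the class-to-duration table separate from the relay decision makes it easy to retune the durations without touching the switching logic.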


4. RESULTS AND DISCUSSION 

4.1. Measure the amount of water used to grow maize through automated algorithm control 

a. Count the water sprinklers in the experimental field, which measures 60 meters long x 20 meters wide: 
1 row of maize is 60 meters long and the sprinklers are spaced 30 centimeters apart, so one row requires 
6,000/30=200 sprinklers. Growing 25 rows of maize requires a total of 200x25=5,000 water sprinklers for 
the experiment. 

b. Measure the rate of water used per hour: with 5,000 water sprinklers and a flow rate of 2.5 liters per 
hour per sprinkler, the total water flow rate is 5,000x2.5=12,500 liters per hour. 

c. From 30 days of data collection: the total duration required to water the entire area was 390 minutes, 
or 6 hours and 30 minutes. The water used is therefore 12,500x6.5=81,250 liters per month. 
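
The arithmetic in steps a to c can be reproduced with the short sketch below. The constant and function names are hypothetical; the figures are those stated in the steps, with the watering time taken as the 6.5 hours used in step c.

```python
ROW_LENGTH_CM = 6_000        # one 60-meter row
SPRINKLER_SPACING_CM = 30    # distance between sprinklers
ROWS = 25                    # rows of maize in the field
FLOW_L_PER_HOUR = 2.5        # liters per hour per sprinkler
WATERING_HOURS = 6.5         # total watering time over 30 days


def automated_water_use():
    """Return (sprinklers, liters per hour, liters per month)."""
    sprinklers = (ROW_LENGTH_CM // SPRINKLER_SPACING_CM) * ROWS
    liters_per_hour = sprinklers * FLOW_L_PER_HOUR
    liters_per_month = liters_per_hour * WATERING_HOURS
    return sprinklers, liters_per_hour, liters_per_month


print(automated_water_use())
```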


4.2. Calculate the amount of water required to grow the maize through statistical methodology 

Watering the crops by the statistical methodology requires calculating the amount of water needed for 
each type of plant, as follows. Evapotranspiration is the total amount of water lost from the planting 
area to the atmosphere as vapor, through the processes of transpiration and evaporation. The water usage 
coefficient differs between plant types, and the same type of plant may also have a different coefficient 
depending on its age. The water usage coefficient of maize is taken from the data of the Department of 
Agriculture [30], as shown in Table 4. 


Table 4. Water usage coefficient (K_c) of maize at different month of age [30] 
Month of age  1     2     3     4     5     6     7     8     9     10    11    12    13    14 
K_c           0.63  0.72  0.86  1.13  1.35  1.52  1.61  1.63  1.58  1.50  1.38  1.15  0.90  0.67 


The potential evapotranspiration, ET_p, can be calculated from the climate statistics of Thailand 
published by the Meteorological Department, Ministry of Digital Economy and Society [31]. ET_p for the 
planting area in Phetchabun province [32] is shown in Table 5. 


Table 5. Amount of water used by plants (ET_p, mm per day) referred to the planting area in Phetchabun province [31] 
Month  Jan   Feb   Mar   Apr   May   Jun   Jul   Aug   Sep   Oct   Nov   Dec 
ET_p   3.33  4.05  4.96  5.18  4.16  3.69  3.58  3.43  3.22  3.69  3.73  3.41 


The water requirement was then calculated with (5): 

$$ET = K_c \times ET_p \tag{5}$$

where ET=evapotranspiration, K_c=crop coefficient, and ET_p=potential evapotranspiration. 

The amount of water required by maize was calculated for the duration of the experiment, which started 
in May 2021 and lasted 1 month. From Tables 4 and 5, K_c equals 0.63 and ET_p equals 4.16 mm per day, so 

ET = 0.63 x 4.16 
   = 2.6208 mm per day 

The amount of water required to grow maize for a month is 2.6208x30=78.624 mm. The total amount of water 
needed for 1,200 square meters is 1,200 square meters x (78.624/1,000) meters=94.3488 cubic meters per 
month, or approximately 94,349 liters per month. By comparison, the automated watering controller used 
81,250 liters per month for the same area, which is 13,099 liters less. This saves 13.89% of the water 
used within the 1,200 square meter planting area. 
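
The comparison between the statistical requirement and the automated system can be reproduced with the sketch below. The function names are hypothetical; the inputs are the 2.6208 mm per day, 30 days, 1,200 square meters, and 81,250 liters stated in the text. With unrounded intermediate values the saving comes to roughly 13.9%, in line with the reported 13.89%.

```python
from decimal import Decimal


def monthly_requirement_liters(et_mm_per_day, days, area_m2):
    """Convert a daily water depth in mm to liters over an area and period."""
    depth_m = et_mm_per_day * days / Decimal(1000)   # mm -> m
    volume_m3 = depth_m * area_m2
    return volume_m3 * Decimal(1000)                 # cubic meters -> liters


def saving(required_liters, used_liters):
    """Return (liters saved, percentage saved) relative to the requirement."""
    saved = required_liters - used_liters
    return saved, saved / required_liters * 100


required = monthly_requirement_liters(Decimal("2.6208"), 30, 1200)
saved, percent = saving(required, Decimal(81250))
print(required, saved, round(percent, 1))
```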

5. CONCLUSION 

This research used deep learning to impute missing values, a popular technique for data cleansing and 
imputation in this type of research. The data were then clustered into groups with the k-means technique. 
Based on the temperature and soil moisture data, 3 classes were obtained and used to train the prediction 
models. A comparison of accuracy and RMSE showed that the deep learning algorithm gave the best values. 
The resulting algorithm was written to the Arduino board of the coordinator node to regulate the watering 
system. Sensor data from the router nodes were analyzed to provide the right amount of water based on the 
predicted class. After 30 days of testing, the new system used 13.89% less water than the conventional 
system. For future work, more sensors, such as sunlight sensors and soil mineral detectors, should be 
installed to collect more information for further analysis. This methodology can be applied to other types 
of plants, and a drone system can also be incorporated to collect video data for analyzing crop growth or 
identifying pests. 